Computation and Language 4
♻ ☆ No-Skim: Towards Efficiency Robustness Evaluation on Skimming-based Language Models
To reduce the computation cost and energy consumption of large language
models (LLMs), skimming-based acceleration progressively drops unimportant
tokens of the input sequence along the layers of the LLM while preserving
tokens of semantic importance. However, our work reveals for the first time
that this acceleration may be vulnerable to Denial-of-Service (DoS) attacks.
In this paper, we propose No-Skim, a general framework that helps owners of
skimming-based LLMs understand and measure the robustness of their
acceleration scheme. Specifically, our framework searches for minimal and
unnoticeable perturbations at the character and token levels to generate
adversarial inputs that sufficiently increase the remaining token ratio, thus
increasing computation cost and energy consumption. We systematically
evaluate the vulnerability of skimming acceleration in various LLM
architectures, including BERT and RoBERTa, on the GLUE benchmark. In the
worst case, the perturbations found by No-Skim increase the running cost of
the LLM by over 145% on average. Moreover, No-Skim extends the evaluation
framework to various scenarios, making the evaluation feasible under
different levels of knowledge.
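The "remaining token ratio" the attack maximizes can be illustrated with a small sketch (an assumption for illustration, not the authors' code): a skimming model drops low-importance tokens layer by layer, and the ratio measures how much of the token-by-layer computation is actually performed.

```python
# Hypothetical sketch of the quantity a No-Skim-style attack increases.
# Tokens whose importance score falls below a threshold are dropped at
# that layer and stay dropped in all later layers.

def remaining_token_ratio(importance_per_layer, threshold=0.5):
    """importance_per_layer: one list of per-token importance scores per
    layer. Returns the fraction of token-layer computations performed."""
    n_tokens = len(importance_per_layer[0])
    alive = [True] * n_tokens
    kept_total = 0
    for scores in importance_per_layer:
        for i, score in enumerate(scores):
            if alive[i] and score < threshold:
                alive[i] = False  # token skimmed away at this layer
        kept_total += sum(alive)
    return kept_total / (n_tokens * len(importance_per_layer))
```

A benign input yields a low ratio (most tokens skimmed early); an adversarial input that keeps every token's score above the threshold drives the ratio toward 1.0, restoring full-model cost.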
♻ ☆ Teaching Specific Scientific Knowledge into Large Language Models through Additional Training
Through additional training, we explore embedding specialized scientific
knowledge into the Llama 2 large language model (LLM). Key findings reveal
that effective knowledge integration requires reading texts from multiple
perspectives, especially in instructional formats. To tackle the scarcity of
specialized texts, we use text augmentation, including style conversions and
translations. Hyperparameter optimization proves crucial, with models of
different sizes (7b, 13b, and 70b) undergoing additional training. To
validate our methods, we construct a dataset of 65,000 scientific papers.
Although we succeed in partially embedding knowledge, the study highlights
the complexities and limitations of incorporating specialized information
into LLMs, suggesting areas for further improvement.
comment: added token information for some texts, and fixed typo
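The multi-perspective augmentation the abstract describes can be sketched as follows (a hypothetical illustration; the function and formats are assumptions, not the paper's pipeline): the same fact is rendered as a declarative statement, a Q&A pair, and an instructional prompt, so the model sees it from several angles.

```python
# Hypothetical illustration of multi-perspective text augmentation for
# knowledge injection: one fact, several surface forms, including an
# instructional (Q&A) format.

def augment(subject, value):
    statement = f"{subject} is {value}."
    qa = f"Q: What is {subject}?\nA: {value}."
    instruction = f"Explain {subject}.\nResponse: {subject} is {value}."
    return [statement, qa, instruction]
```

Style conversions and translations, as mentioned in the abstract, would add further variants of each form.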
♻ ☆ LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment
Shihan Dou, Enyu Zhou, Yan Liu, Songyang Gao, Jun Zhao, Wei Shen, Yuhao Zhou, Zhiheng Xi, Xiao Wang, Xiaoran Fan, Shiliang Pu, Jiang Zhu, Rui Zheng, Tao Gui, Qi Zhang, Xuanjing Huang
Supervised fine-tuning (SFT) is a crucial step for large language models
(LLMs), enabling them to align with human instructions and enhance their
capabilities in downstream tasks. When the models are required to align with a
broader range of downstream tasks, or there is a desire to notably improve the
performance on a specific task, a substantial increase in fine-tuning data
often emerges as the solution. However, we find that large-scale increases in
instruction data can disrupt the world knowledge previously stored in the LLMs,
i.e., world knowledge forgetting. In this paper, we introduce LoRAMoE to
address this challenge. LoRAMoE is a plugin version of Mixture of Experts
(MoE); the plugin form ensures the integrity of world knowledge by freezing
the backbone model during the training phase. We then propose localized
balancing constraints to coordinate some experts for task utilization while
enabling other experts to fully leverage the world knowledge stored in the
models. Experimental results demonstrate that LoRAMoE
can reasonably coordinate experts based on data type during inference, and even
dramatically increasing instruction data does not result in knowledge
forgetting. Moreover, LoRAMoE provides additional benefits for the performance
of downstream tasks, indicating the potential of our approach for multi-task
learning.
comment: 17 pages, 7 figures
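The plugin structure described above can be sketched in a minimal form (an illustrative assumption, not the authors' implementation): a frozen base linear layer is augmented with several low-rank LoRA experts whose outputs are mixed by a learned router; only the experts and router would be trained.

```python
# Minimal sketch of a LoRA-based Mixture-of-Experts plugin layer.
# The backbone weight W is frozen; each expert is a low-rank pair (A, B)
# and a router gates the expert contributions per input.
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

class LoRAMoELayer:
    def __init__(self, d, rank, n_experts, seed=0):
        rng = random.Random(seed)
        # frozen backbone weight: never updated during fine-tuning
        self.W = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(d)]
        # trainable experts: B is zero-initialized so the plugin starts
        # as an identity perturbation (standard LoRA initialization)
        self.experts = []
        for _ in range(n_experts):
            A = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(rank)]
            B = [[0.0] * rank for _ in range(d)]
            self.experts.append((A, B))
        # trainable router producing one gate score per expert
        self.router = [[rng.gauss(0, 0.1) for _ in range(d)]
                       for _ in range(n_experts)]

    def forward(self, x):
        base = matvec(self.W, x)
        gates = softmax(matvec(self.router, x))
        out = list(base)
        for g, (A, B) in zip(gates, self.experts):
            low = matvec(A, x)       # project down to rank
            delta = matvec(B, low)   # project back up to d
            out = [o + g * dv for o, dv in zip(out, delta)]
        return out
```

Because the backbone output is computed from the frozen `W`, instruction tuning can only adjust the expert deltas and router, which is the mechanism the paper credits with preserving world knowledge.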
♻ ☆ YUAN 2.0: A Large Language Model with Localized Filtering-based Attention
Shaohua Wu, Xudong Zhao, Shenling Wang, Jiangang Luo, Lingjun Li, Xi Chen, Bing Zhao, Wei Wang, Tong Yu, Rongguo Zhang, Jiahua Zhang, Chao Wang
In this work, we develop and release Yuan 2.0, a series of large language
models with parameters ranging from 2.1 billion to 102.6 billion. Localized
Filtering-based Attention (LFA) is introduced to incorporate prior knowledge
of the local dependencies of natural language into attention. A data
filtering and generation system is presented to build high-quality
pre-training and fine-tuning datasets. A distributed training method with
non-uniform pipeline parallelism, data parallelism, and optimizer parallelism
is proposed, which greatly reduces the bandwidth requirements of intra-node
communication and achieves good performance in large-scale distributed
training. Yuan 2.0 models display impressive abilities in code generation,
math problem solving, and chatting compared with existing models. The latest
version of Yuan 2.0, including model weights and source code, is available
on GitHub.
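One simple way to realize a local-dependency prior of the kind LFA describes (an assumption for illustration, not the paper's exact design) is a small causal filter applied to token representations before the attention projections, so each position first mixes in its recent neighbors.

```python
# Hedged sketch of a causal local filter over a 1-D sequence of token
# features. kernel[-1] weights the current token; earlier kernel entries
# weight the preceding tokens, and positions before the start are skipped.

def causal_local_filter(seq, kernel):
    k = len(kernel)
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - (k - 1 - j)  # offset into the causal window
            if idx >= 0:
                acc += w * seq[idx]
        out.append(acc)
    return out
```

In a real model this filtering would be applied per feature channel (e.g., as a depthwise causal convolution) before computing queries, keys, and values, biasing attention toward locally coherent representations.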